Draft: control nonce space and timeouts for all chip topologies by warioishere · Pull Request #546 · shufps/ESP-Miner-NerdQAxePlus

warioishere · 2026-04-04T11:28:05Z

Summary

Port of Bitaxe ESP-Miner PR 420 - dynamic Hash Counting Number (HCN) calculation for BM1366/BM1368/BM1370 ASICs.

Required for SV2 Standard Channel support where no extranonce is available and the ASIC must search the full nonce space.

Changes

Added setNonceSpace(frequency, asic_count, cores) to Asic base class
Added getCoreCount() to each ASIC subclass (BM1366=112, BM1368=80, BM1370=128)
Replaced setVrFrequency(vrFrequency) with setNonceSpace() in each init()
Formula: HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5
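A minimal C++ sketch of the HCN formula above. It assumes FREQ_MULT is the 25 MHz reference clock (the 25/615 factor in the table below suggests this) and that the ASIC frequency is given in MHz; `next_pow2` and `hash_counting_number` are illustrative names, not the PR's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: FREQ_MULT is the 25 MHz reference clock used by PR 420.
constexpr double kFreqMult = 25.0;

// Smallest power of two >= v (v itself if it already is one).
static inline uint32_t next_pow2(uint32_t v) {
    assert(v > 0 && v <= (1u << 31));
    uint32_t p = 1;
    while (p < v) p <<= 1;
    return p;
}

// HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5
double hash_counting_number(double freq_mhz, uint32_t asic_count, uint32_t cores) {
    assert(freq_mhz > 0.0);
    const double space = 4294967296.0 / next_pow2(cores) / next_pow2(asic_count);
    return space * kFreqMult / freq_mhz * 0.5;
}
```

All three core counts (112, 80, 128) round up to 128, so a 4-chip board at 615 MHz comes out at roughly 170,500 regardless of which of the three chips it uses.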

Open question: Register 0x10 conflict

NerdQAxePlus uses register 0x10 for Version Rolling Frequency (vrFreqToReg), while ESP-Miner uses it for Hash Counting Number. These produce very different values:

Purpose	Formula (NerdQAxe++, 4x BM1370 @ 615MHz)	Register value
VR Frequency (25kHz)	`VR_REG_PER_HZ / 25000`	~7,864
HCN (PR #420)	`(2^32/128/4) * 25/615 * 0.5`	~170,500

Need review: Can register 0x10 serve both purposes? Does the HCN calculation implicitly set the correct VR timing? Or do we need both writes?

cc @shufps @mutatrum @adammwest for review of the register 0x10 semantics.

Test plan

V1 mining still works with new HCN values
Version rolling still functions correctly
SV2 Standard Channel can search full nonce space

Replaces static setVrFrequency with a computed Hash Counting Number (HCN) on register 0x10, based on bitaxeorg/ESP-Miner PR#420.

Formula: HCN = (2^32 / next_pow2(cores) / next_pow2(asics)) * FREQ_MULT / freq * 0.5

Core counts: BM1366=112, BM1368=80, BM1370=128

NOTE: NerdQAxePlus previously used register 0x10 for Version Rolling Frequency (vrFreqToReg). This change replaces that with the HCN value. The interaction between VR frequency and HCN on register 0x10 needs review - they produce very different values for the same register.

shufps · 2026-04-05T09:41:18Z

Hi,

> Need review: Can register 0x10 serve both purposes? Does the HCN calculation implicitly set the correct VR timing? Or do we need both writes?

I experimentally verified that register 0x10 sets the version rolling frequency by logging nonce wrap-around times and how fast the version counter advances.

Not sure what Bitaxe thinks this register is for.

And I don't see (yet) why changing register 0x10 might be necessary for SV2.

The only thing I found that makes tweaking 0x10 necessary is when the ASIC clock is so high that the search nonce wraps around in the search space before the version counter is incremented, leading to duplicate shares.

Increasing the VR-frequency fixes it then.

I only observed that at ASIC frequencies of, hmm, around 1100MHz and higher

warioishere · 2026-04-05T09:54:58Z

Thanks for the clarification! That makes sense for V1/Extended Channel where new work arrives every ~500ms via extranonce_2 increment.

The reason we opened this is for SV2 Standard Channel (from our SV2 PR #544):

In Standard Channel, the pool provides the complete block header - the miner has no extranonce to increment. The miner must rely entirely on nonce (32-bit) + version rolling to find shares. New work only arrives when the pool sends a new job, which is typically every 30-60 seconds depending on pool template settings.

I just verified this on hardware: the miner produces duplicate shares within seconds in Standard Channel mode. The current VR frequency setting causes the ASIC to exhaust its search space too quickly.

So the question is: can we adjust register 0x10 to give the ASIC enough nonce + version rolling search space for 30-60 seconds of autonomous mining?

That's also why we tagged @mutatrum and @adammwest - we're not ASIC firmware experts and wanted their input on the register 0x10 semantics.

warioishere · 2026-04-05T09:59:17Z

Some concrete numbers on why this matters for SV2 Standard Channel:

Total search space with full nonce (2^32) + version rolling (16 bits, 2^16 values):

2^48 ≈ 281.5 TH

Device	Hashrate	Time to exhaust full search space
NerdQAxe+	2.5 TH/s	~113 seconds
NerdQAxe++	4.8 TH/s	~59 seconds
NerdOCTAXE	9.0 TH/s	~31 seconds

With pool template intervals of 30-60 seconds, only the NerdQAxe+ would be safe if the full search space is utilized. The NerdQAxe++ and NerdOCTAXE would additionally need ntime rolling to avoid exhausting the search space between templates.

Current problem: The VR frequency at 25 kHz cycles through all 65536 versions in ~2.6 seconds, regardless of whether all nonces per version have been checked. That's why I'm seeing duplicate shares within seconds on hardware.
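The arithmetic behind the table and the 2.6 s figure can be reproduced with a small C++ sketch. It assumes every hash in the 2^48 space is unique and that the version counter advances at exactly the VR frequency; both are simplifications, and the function names are hypothetical.

```cpp
#include <cassert>

// 2^32 nonces x 2^16 rolled versions = 2^48 (about 281.5e12) hashes per job.
constexpr double kNonces = 4294967296.0;
constexpr double kVersions = 65536.0;

// Seconds to cover the full nonce + version space once at `ths` TH/s,
// assuming every hash is unique (perfect partitioning, no wasted work).
double seconds_to_exhaust(double ths) {
    assert(ths > 0.0);
    return kNonces * kVersions / (ths * 1e12);
}

// Seconds for the version counter to cycle through all 2^16 versions when it
// advances at a fixed rate of `vr_hz` steps per second.
double version_wrap_seconds(double vr_hz) {
    assert(vr_hz > 0.0);
    return kVersions / vr_hz;
}
```

At 2.5, 4.8 and 9 TH/s this gives about 113, 59 and 31 s, and 65536 / 25000 is about 2.6 s for the version counter, matching the figures above.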

The register 0x10 value needs to be tuned so that the version rolling rate matches the ASIC's actual nonce scanning speed - ensuring the full 2^48 search space is covered before wrapping around. And for devices above ~2.5 TH/s, ntime rolling support would be needed on top.

shufps · 2026-04-05T10:02:59Z

> So the question is: can we adjust register 0x10 to give the ASIC enough nonce + version rolling search space for 30-60 seconds of autonomous mining?

IMHO no. I often read things about partitioning and how people believe it works, but I couldn't confirm any of that and always saw it as a big misunderstanding.

Partitioning works via the chip ID and some people think that distributing the chip ID evenly across the possible 127 IDs, like Bitaxe is doing it, makes it use more search space automagically but this is imho a misconception and I never saw anything that would confirm that it's like that.

This is clearly visible when checking the chip ID of the nonce, it's always the ID that is set during the initialization and not some ID in between of two ASICs.

And register 0x10 needs some balancing between ASIC frequency, version rolling frequency and job interval times.

Just using 0x10 to try to give more time won't work, especially not in the 1.5s+ range.

warioishere · 2026-04-05T10:06:42Z

@shufps fair points about partitioning - I'm not claiming to understand the ASIC internals better than you do.

But the empirical evidence is clear: on the Bitaxe with the same BM1368/BM1370 ASICs, SV2 Standard Channel works with the register 0x10 change from Bitaxe ESP-Miner PR 420. Without it, duplicate shares within seconds. With it, stable mining. Same ASIC, same pool, same protocol.

That PR was written by @adammwest who has deep knowledge of these ASICs. I'd really value his input here on what register 0x10 actually controls and why the HCN calculation makes Standard Channel work on Bitaxe hardware.

If there's a fundamental difference in how NerdQAxePlus configures the ASICs vs Bitaxe that makes this approach not work here, I'd like to understand that too. Happy to do more testing on hardware to figure this out.

shufps · 2026-04-05T10:08:10Z

> If there's a fundamental difference in how NerdQAxePlus configures the ASICs vs Bitaxe that makes this approach not work here, I'd like to understand that too. Happy to do more testing on hardware to figure this out.

The main difference is how the chip IDs are set - maybe I'm wrong and everything works completely different than I figured out lol ... Would really surprise me because the picture that I have mentally is very consistent with all I have learned within the last 2 years 😂

warioishere · 2026-04-05T10:11:38Z

@shufps totally respect your 2 years of experience here! And the chip ID difference could well be a factor.

But I can reproduce this directly on my Bitaxe (same BM1368 and BM1370 ASICs):

Bitaxe firmware WITHOUT PR 420: SV2 Standard Channel → duplicate shares within seconds, same behavior as on NerdQAxePlus right now
Bitaxe firmware WITH PR 420: SV2 Standard Channel → no duplicate shares, stable mining, search space is extended

Same device, same pool, same ASIC - just the register 0x10 value changes. So whatever the HCN calculation does to that register, it measurably extends the search space on the Bitaxe.

Would be interesting to test the same register value on NerdQAxePlus and see if the chip ID difference matters or not.

shufps · 2026-04-05T10:13:31Z

> Would be interesting to test the same register value on NerdQAxePlus and see if the chip ID difference matters or not.

I guess the best test would be to just copy how they set the chip IDs.

Would be a couple of changed lines. I guess any AI could do that in 5 minutes.

But there really is nothing else I could imagine that could make a difference / be different.

warioishere · 2026-04-05T10:15:17Z

Good idea 👍 We'll run some tests - both with the Bitaxe-style chip ID setup and with the HCN register value - and report back with the results.

mutatrum · 2026-04-05T10:22:02Z

There might be something in the middle. What surprised me a long time ago is that with Bitaxe PR 420, the nonce space exhausts in the same time no matter the frequency. TBF, this needs to be re-verified as it's from over a year ago, but that would mean Bitaxe also misses a component in the init somewhere. Maybe we each have a partial picture: both approaches work, but neither is the complete picture.

- Chip ID distribution: 256 / chip_counter (instead of hardcoded 2/4)
- m_addressInterval member for consistent chip addressing
- Nonce-to-ASIC mapping: (bswap32(nonce) >> 17) / address_interval
- chipIndexFromAddr uses address_interval (removed BM1370 override)
- All per-chip CMD_WRITE_SINGLE use address_interval
- Disabled checkVrFrequencyChanged (overwrites HCN on register 0x10)

warioishere · 2026-04-05T17:35:15Z

@shufps here are our test results - you were right about the chip IDs being the key factor:

Test Results (NerdQAxe++, 4x BM1370 @ 615MHz, SV2 Standard Channel)

Test 1: HCN on register 0x10, original chip IDs (0,4,8,12)

Result: All 4 ASICs hash at full speed (~5 TH/s). V1 shares accepted, no regression. But in Standard Channel: first few unique shares accepted, then duplicates within seconds. Each nonce reported 4x simultaneously - all 4 chips find the same nonce because they search the same nonce space.

Test 2: Same as Test 1 + disabled `checkVrFrequencyChanged` (was overwriting register 0x10)

Result: Same duplicate pattern. Confirmed the VR frequency overwrite was a problem, but HCN value alone doesn't partition the nonce space between chips.

Test 3: HCN + Bitaxe-style chip IDs (0,64,128,192) + updated nonce-to-ASIC mapping

Result: All 4 ASICs hash, shares accepted, no duplicate shares, stable mining on Standard Channel! Chip ID distribution is the key to nonce partitioning.

Changes needed:

address_interval = 256 / chip_counter (instead of hardcoded 2 or 4)
All per-chip CMD_WRITE_SINGLE commands use i * address_interval
Nonce-to-ASIC mapping: ((bswap32(nonce) >> 17) & 0xff) / address_interval
chipIndexFromAddr: addr / address_interval (removed BM1370 override that used addr >> 2)
Register 0x10: HCN value from setNonceSpace() instead of VR frequency
checkVrFrequencyChanged disabled (was overwriting HCN on register 0x10)
HCN recalculated on ASIC frequency change

This also works for multi-chip boards like OCTAXE (8 chips → address_interval=32).
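A C++ sketch of the chip addressing and nonce attribution listed above. It assumes, as the mapping formula implies, that the chip's address sits in bits 17 to 24 of the byte-swapped nonce; `bswap32` here is a portable stand-in for whatever byte-swap helper the firmware uses, and the other function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Portable 32-bit byte swap; the firmware's own bswap32 is assumed to behave the same.
static inline uint32_t bswap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) | (v << 24);
}

// address_interval = 256 / chip_counter (4 chips -> 64, 8 chips -> 32).
uint32_t address_interval(uint32_t chip_counter) {
    assert(chip_counter > 0 && chip_counter <= 256);
    return 256u / chip_counter;
}

// Chip i gets address i * address_interval, e.g. 0, 64, 128, 192 for four chips.
std::vector<uint8_t> chip_addresses(uint32_t chip_counter) {
    const uint32_t interval = address_interval(chip_counter);
    std::vector<uint8_t> addrs;
    for (uint32_t i = 0; i < chip_counter; ++i) {
        addrs.push_back(static_cast<uint8_t>(i * interval));
    }
    return addrs;
}

// Nonce-to-ASIC mapping: ((bswap32(nonce) >> 17) & 0xff) / address_interval.
uint32_t asic_index_from_nonce(uint32_t raw_nonce, uint32_t interval) {
    return ((bswap32(raw_nonce) >> 17) & 0xFFu) / interval;
}

// chipIndexFromAddr: addr / address_interval (no BM1370-specific override).
uint32_t chip_index_from_addr(uint8_t addr, uint32_t interval) {
    return addr / interval;
}
```

Because the lower bits of the byte-swapped nonce are discarded by the shift and the integer division, any nonce found inside a chip's slot maps back to that chip's index.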

Open questions

VR frequency feature

checkVrFrequencyChanged writes to the same register 0x10 that HCN uses. Currently disabled to prevent overwriting HCN. Should we remove the VR frequency UI feature entirely, or is there a way to combine both?

NerdOCTAXE and NerdQAxe++ need ntime rolling

The full nonce + version rolling search space lasts ~59 seconds at 4.8 TH/s and ~31 seconds at 9 TH/s. Most pools send new templates every 30-60 seconds. For the NerdQAxe++ and OCTAXE (and future faster devices), we need ntime rolling to avoid exhausting the search space between templates. Our plan: increment ntime every 5 seconds, giving enough headroom for overclocking and future higher-hashrate boards. With 60s template intervals that's at most 12 ntime increments - well within consensus tolerance.

Calculates how long the ASIC needs to exhaust the full nonce+version search space. Used to determine when ntime needs to be incremented for Standard Channel on multi-chip boards.

warioishere · 2026-04-05T21:12:23Z

Update on the ntime rolling from our last post: instead of a fixed 5 second interval we went with a dynamic approach. We ported calculate_bm_timeout_ms() from ESP-Miner PR 420 which calculates how long the search space actually lasts based on frequency, cores and ASIC count. We roll ntime at 80% exhaustion.

Tested on both NerdQAxe++ and NerdOCTAXE-Gamma with SV2 Standard Channel, no duplicates:

I (59205) mining_info_v2: ntime roll #1: ntime=1775422187 (search space 25.0s exhausted)

NerdQAxe++ rolls at ~46s (80% of ~57s), OCTAXE at ~25s (80% of ~31s). The counter resets on each new pool template so you mostly just see "#1" with 30s template intervals.
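The roll schedule described above (roll at 80% of the computed exhaustion time, restart on each new template) might look roughly like this. The sketch takes the exhaustion time as an input rather than porting `calculate_bm_timeout_ms()`, whose internals aren't shown here; the one-second-per-roll increment and the helper names are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Roll interval: a fraction (0.8 in the text) of the time the search space lasts.
double ntime_roll_interval_s(double exhaust_s, double fraction = 0.8) {
    assert(exhaust_s > 0.0 && fraction > 0.0 && fraction <= 1.0);
    return exhaust_s * fraction;
}

// ntime for a job that has been running `elapsed_s` seconds since its template
// arrived: the template's ntime plus one per completed roll interval (assumed).
// `elapsed_s` restarts at 0 whenever a new template arrives.
uint32_t rolled_ntime(uint32_t template_ntime, double elapsed_s, double roll_interval_s) {
    assert(elapsed_s >= 0.0 && roll_interval_s > 0.0);
    const auto rolls = static_cast<uint32_t>(elapsed_s / roll_interval_s);
    return template_ntime + rolls;
}
```

With ~57 s and ~31 s of search space this yields roll intervals of about 45.6 s and 24.8 s, so a 30 s template interval on the OCTAXE sees at most one roll before the counter resets.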

This can't go into this PR because it depends on code from both this PR and our SV2 PR (#544). Will follow as a separate PR once both are merged, code is ready on our test branch.

https://github.com/warioishere/ESP-Miner-NerdQAxePlus/tree/test/sv2-nonce-space-v2

VR frequency question still open.

shufps · 2026-04-06T05:49:20Z

Hmm interesting ... so it seems the nonce space is really automagically evenly partitioned between the chip IDs - I always thought that wouldn't happen but it seems I was wrong all that time^^

But there is another problem, the dual pool scheduling relies on jobs being switched in short time like 500ms.

Letting a job run for 30s or so somewhat breaks this.

Btw do we need to support Standard?

Pragmatic approach would be just to use Extended -> voilà, problem solved.

wdyt?

warioishere · 2026-04-06T17:19:33Z

On the Dual Pool concern - the chip ID and HCN changes sit on the ASIC driver level but job switching is controlled higher up. V1 and Extended keep sending new work every 500ms regardless of HCN, so Dual Pool is not affected. Standard Channel has an m_jobSent flag that stops resending, and Dual Pool is already blocked for Standard Channel in the UI anyway.

About whether we need Standard Channels - actually yes, ideally we should support them. The SV2 spec designed Standard Channels for end-mining devices doing Header-Only Mining. They just get a ready Merkle Root and hash, no coinbase/extranonce handling needed. Extended Channels are actually meant for proxies, not end devices. When a miner opens an Extended Channel directly to a pool it's essentially doing the proxy's job - computing coinbase hashes, walking merkle paths, managing extranonce. Works fine but it's a workaround.

The real use case for Standard Channels: someone running a JDC (Job Declarator Client) for template control. The JDC acts as a local proxy, opens an Extended Channel upstream and feeds Standard Jobs to downstream miners. Our devices could connect to the JDC via Standard Channel and just hash headers - that's the intended SV2 architecture, which also lets you run your own JD-Client in between to control your own block template. The SV2 folks are already doing some great work here to make this quite easy to set up:

https://github.com/stratum-mining/sv2-ui

For connecting directly to a pool without proxy, Extended is the practical choice. But supporting both means the device works best in both scenarios. And Dual Pool stays disabled for Standard Channel, no conflict there.

shufps · 2026-04-06T18:10:18Z

> For connecting directly to a pool without proxy, Extended is the practical choice. But supporting both means the device works best in both scenarios. And Dual Pool stays disabled for Standard Channel, no conflict there.

Hmm interesting, thx for your explanation.

But chances to get it accepted and merged might be a lot lower with Standard Channel support and all the required changes for it 😉

But it’s really weird. It’s like the Standard Channel was invented completely out of touch with reality.

It's so weird that I actually wouldn't like to support it at all, who cares, Extended Channel is supported too and can do exactly what is needed.

warioishere · 2026-04-06T18:32:44Z

Just curious - what makes Standard Channel feel out of touch for you? The way we see it, Standard Channel + JDC is kind of the whole point. The miner just hashes headers, the JDC proxy does all the heavy lifting (template construction, extranonce management, merkle computation). The miner firmware stays dead simple and you get template control over your own Bitcoin node. That's the core SV2 decentralization use case.

But hey, if Standard Channel is a blocker for merging we're happy to remove it from the SV2 PR and keep it Extended-only. We can always add it later. The nonce space changes in this PR are independent anyway.

cc @GitGab19 @plebhash curious about your thoughts on Standard Channel for small miners / JDC setups

shufps · 2026-04-06T19:33:04Z

> Just curious - what makes Standard Channel feel out of touch for you?

Hmm, I think Standard Channel is perfectly fine for very slow devices like NerdMiners. But once you have to abuse ntime as a replacement, hack, or workaround for extranonce2, that feels like a sign that something in the design is off. 😅

It is more or less a coincidence that BM* chips seem to be able to extend their search space beyond their actual chip ID, so this still works in practice. My impression is that other ASICs may not be able to do that as easily, or at all.

A well-thought-out protocol should have taken into account that the available search space can be exhausted very quickly on certain hardware.

So my view is not that Standard Channel is useless — just that it may simply not be the best fit for BM* ASIC miners. 🤷

For that reason I would tend to just ignore it for now and only support Extended Channel, which is exactly what we need here.

It would also keep the required SV2 changes less invasive — leaving the core code untouched, avoiding potential regressions, and eliminating any weird special-casing based on which protocol is active.

If there's ever a concrete need to support Standard Channel down the line, future-us can revisit it then.

warioishere · 2026-04-06T19:47:42Z

ntime rolling isn't really a hack – the ntime field is part of the block header, and consensus rules give a wide window for it.

The SV2 spec explicitly mentions that miners may need to roll ntime when the search space is exhausted, and that the upstream node should send new jobs frequently enough based on the miner's hashrate.

Behind a JDC proxy, the search space issue mostly goes away anyway. The JDC doesn't send templates every 30–60 seconds like pools do – it has triggers based on mempool fee thresholds that push new templates much more frequently.

So the miner gets fresh Standard Jobs often enough that ntime rolling rarely kicks in. It's just a safety measure so that the miner doesn't run out of search space if no template arrives from the JD-Client.

I actually don't know what's wrong about utilizing the full potential of the ASICs instead of just using endless job creation to compensate for a wrongly configured ASIC. That is what I call a "hack".

But yeah, Extended-only for now works for us. I see I cannot really convince you here. I just want to push SV2 because it has exceptional advantages over SV1, and we have been using an outdated protocol for too long. I have tagged plebhash and GitGab19, the lead devs of SRI/SV2; maybe they can explain better than me that Standard Channels are not only useful for NerdMiners.

I will disable Standard Channels on the SV2 PR for the meantime.

shufps · 2026-04-06T20:01:05Z

> I actually don't know what's wrong about utilizing the full potential of the ASICs instead of just using endless job creation to compensate for a wrongly configured ASIC. That is what I call a "hack".

Nothing against the approach itself — but all the surrounding code was written around how the ASICs are currently configured, and just changing that has ripple effects. The best example is probably the dual pool scheduler, which is a deterministic scheduler built on a (short, ~500ms) fixed job interval. Changing that interval could lead to weird pool hashrate statistics or problems regulating pool difficulty.

So it's not that the idea is wrong, it's just that the cost of changing it outweighs the benefit for now. It is what it is, sorry 🤷

warioishere · 2026-04-06T20:05:31Z

Just to be clear - the chip ID and HCN changes don't touch the job interval at all. V1 and Extended still send new work every 500ms, the dual pool scheduler runs exactly as before. The only difference is that each chip searches a unique nonce partition instead of all chips searching the same space. No ripple effects on scheduling or pool stats.

The dual pool incompatibility only applies to Standard Channel (which doesn't do 500ms job resends). We already block that combination in the UI - dual pool + standard channel is not selectable.

Anyway, we've disabled Standard Channel on the SV2 PR (#544) for now - both UI hidden and backend forced to Extended. Can revisit later if needed.

shufps · 2026-04-06T20:13:21Z

> Just to be clear - the chip ID and HCN changes don't touch the job interval at all. V1 and Extended still send new work every 500ms, the dual pool scheduler runs exactly as before. The only difference is that each chip searches a unique nonce partition instead of all chips searching the same space. No ripple effects on scheduling or pool stats.

Yes, that's what I understood from what you wrote — thanks for clarifying anyway. 👍

My arguments were maybe a bit mixed up. The register 0x10 and chip-ID reluctance is probably more pragmatism than a hard technical blocker: it would require updating log output, the nonce histogram and similar things (I think even the hashrate register reading) that all build on current assumptions, and the change would have to cover BM1366, BM1368 and BM1370 because the FW is used on multiple devices.

The main concern was really Standard Channel and dual pool compatibility, which you've already addressed by disabling it for now. So we're good. 🙂

adammwest · 2026-04-07T12:15:27Z

@warioishere
ah you found an assumption of PR420

PR420 assumes that the address interval is 256/(2**ceil(log2(chain_length))),
reserving the minimum number of bits for a particular chain length.
I think the functions of PR420 could be abstracted to use address_interval and chips found.

As for the chip interval (as far as my understanding goes):
I think the chip address interval reserves nonce space for chips. You used interval=4 in the current code;
this means 256/4 = 64, so up to 64 chain length, or 6 reserved bits inside the nonce space.

https://github.com/shufps/ESP-Miner-NerdQAxePlus/blob/develop/components/bm1397/bm1370.cpp#L75

    // set chip address
    for (uint8_t i = 0; i < chip_counter; i++) {
        setChipAddress(i * 4);
    }

but as there are 4 chips in the NerdQAxe++ (I assume chain length 4), you only claim 4 of the 64 reserved slots

so the nonce range becomes 2^32 / 64 = 2^26 per slot after reserving (not in any particular order/bit representation, just size);
then you claim 4, so the result is 4 * 2^26 = 2^28 of mineable nonce space, or 6.25%
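The size argument above reduces to a simple ratio: with an address interval of `interval`, the nonce space is split into 256 / interval slots and only the slots that hold a chip get mined. A sketch under that reading, restricted to power-of-two intervals; the helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of the 2^32 nonce space actually mined when `chips` chips sit on an
// address grid with spacing `interval` (power-of-two intervals only).
double mined_nonce_fraction(uint32_t chips, uint32_t interval) {
    assert(interval > 0 && 256 % interval == 0);
    const uint32_t slots = 256 / interval;         // interval 4 -> 64 slots (6 bits)
    assert(chips <= slots);
    const double per_slot = 4294967296.0 / slots;  // 2^32 / 64 = 2^26 nonces per slot
    return chips * per_slot / 4294967296.0;        // 4 * 2^26 / 2^32 = 0.0625
}
```

With the original spacing of 4 this yields 0.0625 (6.25%); with the spacing of 64 used in Test 3 it yields 1.0, i.e. the four chips together cover the whole range.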

@shufps
for the 0x10 register: it is just a nonce_percent register that is variable in size.
The max value is based on the chip frequency and the address_interval.
It is not bounded to 0-100%; it can go over 100%, and then you get reduced performance.

If you make the register smaller, the roll-over will be faster because the nonce space is smaller.
Make it bigger and the roll-over time will be longer.

If you only change freq, the roll-over time stays the same, as the chip frequency changes the max size of 0x10.

> The only thing I found that makes tweaking 0x10 necessary is when the ASIC clock is so high that the search nonce wraps around in the search space before the version counter is incremented, leading to duplicate shares.
> Increasing the VR-frequency fixes it then.
> I only observed that at ASIC frequencies of, hmm, around 1100MHz and higher

This is an interesting observation. According to my understanding, the frequency must be bounded, as it changes the size of the nonce space proportionally to freq, so with 4 chips and a high freq maybe you found an example of the frequency limit. It should have an upper bound.

adammwest · 2026-04-07T12:29:47Z

bitcoin/bips#2116
seems relevant for the Standard Channel discussion

==Motivation==
BIP 320 defined 16 bits of nVersion as nonce space for additional nonce space. It turns out that
this isn't enough, as some devices have started using 7 bits from nTime for extra nonce space (see
stratum-mining/sv2-spec#187). Given there's limited utility in 16
bits of nVersion space for signaling, instead here we offer 24 bits of nVersion space as extra
nonce space.

==Rationale==
Headers-only mining avoids mining devices (either ASICs or the firmware) from having to concern
themselves with the vast space of consensus logic (handling transactions, merkle trees, etc). It is
widely deployed in ASICs, but requires a substantial number of jobs fed across an entire device,
keeping the ASIC controller busy. Providing additional nonce space for the ASICs to roll without
needing fresh work from the controller may simplify ASIC design somewhat, and has been apparently
adopted in some miners by using extra space in nTime as extra nonce space. Doing so in nVersion
instead is preferable to using nTime

PR420 assumes address_interval = 256 / next_power_of_two(chip_counter) to reserve the minimum bits for nonce space partitioning. For our boards (4, 8 chips) the result is identical but this is correct for non-power-of-two chain lengths. Moved next_power_of_two to asic.h as static inline.

plebhash · 2026-04-07T20:21:03Z

> But it’s really weird. It’s like the Standard Channel was invented completely out of touch with reality.

Even though @shufps already retracted this statement, I'll address it first:

I'm not one of the original Sv2 spec authors, but I know that Standard Channels existed in the Sv2 spec since its original draft.

There are a few arguments for the existence of Standard Channels in the Sv2 spec:

enable Header-only Mining (HOM) on end-devices while pushing merkle_path+coinbase_tx_prefix+extranonce+coinbase_tx_suffix complexity upstream
smaller network bandwidth consumption due to the absence of merkle_path+coinbase_tx_prefix+coinbase_tx_suffix on NewMiningJob (when compared to NewExtendedMiningJob) and the absence of extranonce on SubmitSharesStandard (when compared to SubmitSharesExtended)
lighter share validation: validators can check shares against a precomputed job merkle_root instead of rebuilding it from merkle_path+coinbase_tx_prefix+extranonce+coinbase_tx_suffix for every share

Of course, there's always going to be tradeoffs in case an Extended Channel is being split into multiple Standard Channels somewhere along the mining stack. Nevertheless, the arguments above still hold in the general sense, even if weaker in such specific cases.

About Version Rolling:

Please note that while NewExtendedMiningJob has a version_rolling_allowed field, NewMiningJob does not. That's because version rolling is implied to always be mandatory on Standard Jobs.

Please let me know whether this is not clear from the spec, because it should be.

If some implementation skips version rolling on Standard Jobs (or doesn't do it to the full extent), then the search space will become smaller than it could have been, and share duplication will happen before job refresh or ntime is increased.

I have the impression that this is where the confusion arose, and I'd be happy to hear feedback in case anyone thinks we can make this more explicit in the Sv2 spec.
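To make the search-space effect of version rolling concrete, here is a sketch using the BIP 320 general-purpose version mask `0x1FFFE000` (bits 13 to 28 of nVersion). The helper names are hypothetical, and `rolled_version` assumes a contiguous mask.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// BIP 320 general-purpose version bits: bits 13..28 of nVersion.
constexpr uint32_t kBip320Mask = 0x1FFFE000u;

// Number of distinct header versions reachable by rolling only the bits in `mask`.
uint64_t version_combinations(uint32_t mask) {
    return uint64_t{1} << std::bitset<32>(mask).count();
}

// Place a rolling counter into the masked bits of the job's base version;
// bits outside the mask are preserved and the counter wraps modulo the mask width.
uint32_t rolled_version(uint32_t base_version, uint32_t counter, uint32_t mask) {
    assert(mask != 0);
    unsigned shift = 0;
    while (((mask >> shift) & 1u) == 0) ++shift;  // position of the lowest mask bit
    return (base_version & ~mask) | ((counter << shift) & mask);
}

// Header search space per job: 2^32 nonces times the reachable versions.
double search_space(uint32_t mask) {
    return 4294967296.0 * static_cast<double>(version_combinations(mask));
}
```

With the full mask the space is 2^48; with no version rolling it collapses to 2^32, which is the share-duplication effect described above when an implementation rolls fewer bits than allowed.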

About hard hashrate ceiling:

Although we aim for Sv2 spec to be a canonical document that's "written in stone", it already had to undergo many adjustments over time. So it's not necessarily perfect as-is.

The aspect that's admittedly still a bit unpolished is the hashrate threshold for Header-only Mining (HOM), because 280TH/s is likely going to become somewhat "obsolete" for industrial-scale mining in the near future.

As @adammwest pointed out above, there's efforts to expand the number of rollable version bits, which should raise this threshold beyond 280TH/s and solve this problem:

The alternative approach to expand Standard Job search space is by rolling ntime (as in actual rolling, not just increasing it after 1s has elapsed).

While theoretically possible (as in consensus valid), if applied at scale this approach could have unintended consequences on network difficulty adjustment and IMO should be discouraged in the community: stratum-mining/sv2-spec#187

> cc @GitGab19 @plebhash curious about your thoughts on Standard Channel for small miners / JDC setups

Even though I understand why/how @warioishere arrived at this conclusion, I wouldn't necessarily frame Standard Channels as something that's only beneficial to small miners or JDC use-cases.

The range of legitimate use-cases is broader, and could eventually bring real benefits to the industry (reduction in network bandwidth and compute) if/when applied at scale.

But yeah, there are a few moving parts with regard to Version Rolling and Sv2 spec polishing, which understandably cause confusion.

shufps · 2026-04-08T06:05:31Z

> Of course, there's always going to be tradeoffs in case an Extended Channel is being split into multiple Standard Channels somewhere along the mining stack. Nevertheless, the arguments above still hold in the general sense, even if weaker in such specific cases.

Standard Channel is actually something I wanted to have for other things - like some solar-powered lightweight LoRa miner that shouldn't do any crypto on its own. The aim always was to just send a header as workload and let it mine on that.

There was one puzzle piece left I couldn't answer myself though, it's BM* related. How can I let it mine for longer than 1.5s until it wraps around in the search space.

Sending new headers every 1.5s via LoRa is a no-go (not fair use anymore) and sending an entire mining.notify is too big (basically the same problem).

But the 0x10 register that Adam explained above might be the answer for that. It could make a single ASIC just mine for a couple of minutes when search space has been extended over multiple chip-ID bits in the nonce.

Please don't get me wrong, I'm not saying Standard Channel is useless - I just have the feeling it might not be the best fit for this particular project but might be a game changer for others 😅

And thx a lot for taking the time to explain all of this! 🙌

mutatrum · 2026-04-08T10:46:11Z

The BM1370 can mine for several minutes without a new job, if you extend the full nonce range. The BM1366/BM1368 probably similar? Anything before that will not even get to a second.

plebhash · 2026-04-08T13:45:33Z

> There was one puzzle piece left I couldn't answer myself though, it's BM* related. How can I let it mine for longer than 1.5s until it wraps around in the search space.

can't you increase ntime after 1s has elapsed?

that should safely reset the search space, and it's one of the main assumptions behind the 280 TH/s ceiling calculation

(sorry if this is a dumb or uninformed question, I haven't really parsed all the details in this discussion!)

shufps · 2026-04-08T17:26:05Z

> (sorry if this is a dumb or uninformed question, I haven't really parsed all the details in this discussion!)

It's not a dumb question at all! I thought about that too ... but ...

Yes would work but that would be some kind of special case that needs different treatment in the code (rolling ntime instead of enonce2) and I'm not convinced that supporting Standard Channel really is worth the effort when Extended just would work 🙊

And I haven't had a deeper look if ntime provides enough to roll for an effective job switching time of ~500ms that would be required to work properly with the dual pool feature.

adammwest · 2026-04-08T17:45:44Z

> The BM1370 can mine for several minutes without a new job, if you extend the full nonce range. The BM1366/BM1368 probably similar? Anything before that will not even get to a second.

Small detail:
BM1385/87/97: no
BM1398: yes, I know someone who did it
BM1362/66/68/70: yes, I checked

adammwest · 2026-04-08T18:44:20Z

+
+void Asic::setNonceSpace(float frequency, uint16_t asic_count, uint16_t cores) {
+    int cores_up = next_power_of_two(cores);
+    int asic_count_up = next_power_of_two(asic_count);


This is the part that assumes the chip address interval

int asic_count_up = next_power_of_two(asic_count);

would need to be

int asic_count_up = 256/address_interval;

That would mean you can have any address interval (in theory)
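Here is a minimal sketch of the suggested change. It assumes FREQ_MULT = 25, which is the value used in the estimates in this thread, and uses a stand-in `next_power_of_two`. The name `hcn_from_interval` is hypothetical; this is not the firmware's `Asic::setNonceSpace`:

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two >= v (for v >= 1); illustrative stand-in for the
// firmware's next_power_of_two helper.
uint32_t next_power_of_two(uint32_t v) {
    uint32_t p = 1;
    while (p < v) p <<= 1;
    return p;
}

// Assumption: FREQ_MULT = 25, as in the numbers used in this thread.
constexpr double kFreqMult = 25.0;

// HCN from the PR formula, but with the number of chip-ID slots taken from
// the chip address interval (256 / interval) instead of next_power_of_two(asic_count).
uint32_t hcn_from_interval(double freq_mhz, uint32_t cores, uint32_t address_interval) {
    assert(address_interval > 0 && 256 % address_interval == 0);
    const uint32_t cores_up = next_power_of_two(cores);
    const uint32_t chip_slots = 256 / address_interval;
    const double per_core = 4294967296.0 / cores_up / chip_slots;
    return static_cast<uint32_t>(per_core * kFreqMult / freq_mhz * 0.5);
}
```

For a BM1370 (128 cores) at 615 MHz with chip IDs spaced 64 apart (0, 64, 128, 192), this gives 170500. Shrinking the interval to 2 gives 5328, because every chip-ID slot, used or not, gets its own share of the nonce space.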

Good catch, fixed - now using 256/m_addressInterval instead of next_power_of_two(asic_count). Testing this now.

Separate observation: with the nonce space changes we see more hardware duplicate shares (~0.19% vs ~0.03% without). We compared the BM1370 init with Bitaxe early-access and tried removing register 0x68 (not present in Bitaxe) and the extra 0xA4 write after setNonceSpace - didn't help.

Any idea what could cause chips to find the same nonce+version more often with wider address intervals? Happens on both V1 and SV2 Extended and Standard so it's not protocol related.

Is the 0.19% real? We need thousands of shares to be certain of this.

But for the Gamma there was a possible, theorised issue: supposedly the HCN is too big by 268.

If you ran at 615 MHz with address_interval = 256/4 = 64, then
100 - 100 * 268 / (2^25 * 25 / 615 / 4 * 0.5) = 99.85,
i.e. ~0.15% dups, which is close to 0.19%.

It should get worse as hcn_max shrinks: the usable fraction is (HCN_MAX - 268) / HCN_MAX, so the duplicate fraction is 268 / HCN_MAX.

If you can, please test this (I don't have the NerdQ device):

address_interval = 2, hcn = hcn_max, freq = 615 MHz, with hcn_max computed from `int asic_count_up = 256/address_interval;`

Expected:
100 - 100 * 268 / (2^25 * 25 / 615 / 128 * 0.5) = 94.97,
i.e. ~5% dups.

In that case I need to update PR420 as well.
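The estimate can be written out as a short C++ sketch. It assumes the BM1370 figures used above (2^32 / 128 cores = 2^25), FREQ_MULT = 25, and a fixed overlap of 268 nonces. The function names are hypothetical:

```cpp
// Assumptions: BM1370 with 128 cores (2^32 / 128 = 2^25 = 33554432),
// FREQ_MULT = 25, and a fixed per-core overlap of 268 nonces.
constexpr double hcn_max(double freq_mhz, double chip_slots) {
    return 33554432.0 * 25.0 / freq_mhz / chip_slots * 0.5;
}

// Predicted share of duplicates (percent) if HCN is left at hcn_max:
// the overlapping part of each core's range is 268 / HCN_MAX.
constexpr double predicted_dup_percent(double freq_mhz, double chip_slots,
                                       double overlap = 268.0) {
    return 100.0 * overlap / hcn_max(freq_mhz, chip_slots);
}
```

`predicted_dup_percent(615.0, 4.0)` gives about 0.157% and `predicted_dup_percent(615.0, 128.0)` about 5.03%, in line with the two predictions.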

For the Gamma case, the solution would be:

// HW errata of 134 per half clock cycle
int hcn = hcn_max - 268;
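Here is a hedged sketch of the correction as a standalone helper. The constant name and the clamping behaviour are assumptions; the thread only gives `hcn = hcn_max - 268`:

```cpp
#include <cstdint>

// Per the erratum described here: 134 nonces per half clock cycle,
// i.e. 268 nonces of overlap between adjacent cores.
constexpr uint32_t kHcnErrataOverlap = 2 * 134;

// Hypothetical helper: shrink hcn_max by the overlap. Assumption: values not
// larger than the overlap are clamped to 1 instead of wrapping around; how the
// ASIC behaves with such a small HCN is not covered in this thread.
constexpr uint32_t corrected_hcn(uint32_t hcn_max) {
    return hcn_max > kHcnErrataOverlap ? hcn_max - kHcnErrataOverlap : 1u;
}
```

For the two cases above, this gives 170232 (interval 64) and 5060 (interval 2).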

The 0.19% is from two NerdQAxe++ devices running side by side for 2 days, both over 50k shares. One with the nonce space patch, one without. Pretty consistent numbers.

Your math lines up almost perfectly with what we see. We'll test with address_interval=2 and hcn=hcn_max at 615MHz to verify the 5% prediction. If that confirms it we'll add the -268 correction.

Correction: 0.19% was total rejects. Actual hardware duplicates are ~0.16% (0.19% minus the 0.03% from the devices without the patch). That lines up even closer with your 0.15% calculation. Building the address_interval=2 test now.

Results with address_interval=2, hcn=hcn_max, freq=615MHz, 11h runtime, ~11100 shares: ~1.87% duplicates (1.90% total minus 0.03% baseline from devices without the patch).

That's about a third of your predicted 5%. The errata offset might be smaller than 268, or it scales differently than expected.

update: the duplicates come in bursts, not evenly distributed. Just jumped from 1.87% to 1.93% after a cluster. Still climbing slowly. Maybe the overlap only triggers under certain timing conditions, not on every nonce wrap.


That's expected for an HCN that is too big. There are 2 distinct types of duplicates: wrap-around duplicates, which occur when the space ends and restarts, and what I call internal dups (maybe "overlapping range duplicates" is a better name).

Essentially, the chip encodes info (core, chip) in part of the nonce range, and a too-large HCN can overwrite it. What you end up with is a portion of the nonce range that overlaps, so you get solutions that appear very close together in time.

Imagine a fictitious chip with 2 cores and a total nonce range of 256, with a spacing of 128, where we set the size per core to 130:

| Core | Start | End | Range |
|---|---|---|---|
| Core 0 | 0 | 130 | 0 -> 130 |
| Core 1 | 128 | 256 | 128 -> 256 |

We cover 100% of the range, but both cores are assigned the overlapping range 128 -> 130, so we end up with some dups that come back at approximately the same time.
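The toy scenario can be modelled in a few lines of C++. This models the explanation, not real ASIC behaviour. Ranges are half-open and evenly spaced, and they are clamped to the end of the space as in the table. All names are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open nonce range [start, end) assigned to one core.
struct CoreRange {
    uint32_t start;
    uint32_t end;
};

// Toy model: `cores` cores spaced evenly across `space` nonces, each searching
// `size` nonces from its start, clamped to the end of the space.
std::vector<CoreRange> core_ranges(uint32_t cores, uint32_t space, uint32_t size) {
    assert(cores > 0 && cores <= space);
    std::vector<CoreRange> ranges;
    const uint32_t spacing = space / cores;
    for (uint32_t c = 0; c < cores; ++c) {
        const uint32_t start = c * spacing;
        ranges.push_back({start, std::min(start + size, space)});
    }
    return ranges;
}

// Number of nonces searched twice because a core runs into its neighbour.
uint32_t overlapping_nonces(const std::vector<CoreRange>& ranges) {
    uint32_t total = 0;
    for (std::size_t i = 0; i + 1 < ranges.size(); ++i) {
        if (ranges[i].end > ranges[i + 1].start) {
            total += ranges[i].end - ranges[i + 1].start;
        }
    }
    return total;
}
```

`core_ranges(2, 256, 130)` yields [0, 130) and [128, 256), with 2 overlapping nonces (128 and 129). With a size of 128, the overlap drops to 0.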

Thank you for the test! I will update PR420 and use 268 to be safe.

You are a genius! I can already say this fixed the problem: no more dups at all, even on Standard Channels!

warioishere · 2026-04-08T18:52:15Z


I'm still not sure you understood my proposal: with this PR, dual pool mode would just not be available for Standard Channels. It still works on SV1 and SV2 Extended, both of which still switch jobs at a 500 ms interval. Only Standard doesn't use it, and we could also have added a tooltip explaining this to the user.


shufps · 2026-04-09T08:09:36Z

Dualpool mode would just not have been available for Standard channels, when implementing this PR

Why would we add something that requires disabling another feature when it's not really needed?

I guess we could discuss longer about this without ever coming to consensus 😅

No Standard Channel for now, but I'll have a look at this PR and "fix" the nonce space issue (well, actually adjust everything else that breaks when the chip IDs change, if the PR hasn't already fixed that ofc)

But everything related to "version rolling frequency" can be removed then, in the web UI too, because it's not needed anymore.

It was there for adjusting the "frequency" so that on the QX there are no duplicates.

But Adam seems to be right and everything I did was BS in this case 😅

warioishere · 2026-04-09T08:26:57Z


It's still a work in progress anyway, maybe a proof of concept to get the ASIC working as intended and to help understand it better. No need to hurry on anything here. As you said, and I'm fine with that, Extended Channels are good for now.

Sjors · 2026-04-09T09:38:29Z

Yes would work but that would be some kind of special case that needs different treatment in the code (rolling ntime instead of enonce2)

Why "instead"? Wouldn't it make sense to unconditionally bump nTime every second? It keeps the timestamp accurate (that's more of an OCD thing than a real requirement of course).


warioishere · 2026-04-09T17:15:51Z

@Sjors good point - bumping ntime every second unconditionally makes sense. It's not aggressive rolling, just keeping the timestamp accurate. And it gives a natural job refresh point for all modes.

@shufps this would also solve the dual pool concern for Standard Channel - every second the ntime increments, giving you a natural switching point between pools. No 500ms job resend needed, just pick the right pool on each ntime tick. Don't get me wrong, just discussing :) I won't change anything on the SV2 PR anymore :)

Remove all VR frequency infrastructure (vrFreqToReg, vrRegToFreq, setVrFrequency, calculateSearchSpaceMs, getDefaultVrFrequency, NVS storage, HTTP API, Web UI) since the HCN-based nonce space calculation in setNonceSpace() replaces it correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

shufps · 2026-04-10T04:30:38Z

Removed everything with version rolling frequency^^

I'll check on the QX at >1100MHz whether I see duplicates; if not, I'll merge this.

edit: interesting, I don't seem to get duplicates at a job time of 10s on an Octaxe and I can confirm that the nonces use the other previously unused chip ID bits in the nonce too, so it really seems the change has extended the search space per chip.

btw, double-clicking the danger zone button reveals extra edit fields, like the job interval time (and previously the version rolling frequency, which has now been removed)^^

edit2: Nonce evaluation says all bits in the nonce are now used during mining - this is really nice, love that! 🥰

warioishere · 2026-04-10T05:55:56Z

Before merging - the OCTAXE search space is ~31s at 9 TH/s, ~28s overclocked to 10TH/sec. At your 10s job interval that works, but overclocked devices could still hit duplicates if the job interval exceeds the search space. This affects all modes, not just Standard Channel.
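The quoted search-space times are consistent with a simple calculation. The sketch below assumes each job spans the full 2^32 nonce range times 16 rolled version bits (the 0x1fffe000 mask) across the whole board. The function name is illustrative:

```cpp
#include <cassert>
#include <cmath>

// Assumption: one job covers the full 2^32 nonce range times 2^version_bits
// rolled versions (16 bits for the common 0x1fffe000 mask), split across all chips.
double search_space_seconds(double hashrate_ths, int version_bits = 16) {
    assert(hashrate_ths > 0.0);
    return std::ldexp(1.0, 32 + version_bits) / (hashrate_ths * 1e12);
}
```

`search_space_seconds(9.0)` is about 31.3 s and `search_space_seconds(10.0)` is about 28.1 s.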

As Sjors suggested we could bump ntime every second unconditionally. That would:

make the search space problem go away for all modes and all hashrates
potentially allow removing the 500ms job switching entirely
but dual pool scheduling would need to be adapted to use ntime ticks instead of job switches

How do you want to proceed - add ntime rolling to this PR and remove job switching, or merge as-is and handle it separately?

shufps · 2026-04-10T10:03:03Z

Before merging - the OCTAXE search space is ~31s at 9 TH/s, ~28s overclocked to 10TH/sec. At your 10s job interval that works, but overclocked devices could still hit duplicates if the job interval exceeds the search space. This affects all modes, not just Standard Channel.

The 10s were just a test to confirm that we use more bits of the nonce than before and we don't generate duplicates.

How do you want to proceed - add ntime rolling to this PR and remove job switching, or merge as-is and handle it separately?

no, this PR is only about fixing the nonce search space.

The other is only for SV2 Extended for now.

ntime rolling, we will see.

shufps · 2026-04-11T06:29:29Z

nice, I didn't see any duplicates on the QX with >=1100MHz.

Have to test the NQ and NerdAxe too because they use BM1368 and BM1366.

But this looks really nice 🥰

edit: NQ+ works too ✔️

shufps · 2026-04-12T06:17:17Z

I guess I'll just merge this and the other one and release a new beta^^

add calculateSearchSpaceMs for dynamic ntime rolling

ec0ad41

Calculates how long the ASIC needs to exhaust the full nonce+version search space. Used to determine when ntime needs to be incremented for Standard Channel on multi-chip boards.
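The helper below is a hypothetical sketch of how such a value might be consumed. The 0.8 safety margin and the name `needs_ntime_bump` are assumptions, and `search_space_ms` would come from something like `calculateSearchSpaceMs`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scheduling check: bump ntime once the time spent on the current
// (job, ntime) pair reaches a safety fraction of the search-space time.
bool needs_ntime_bump(uint32_t elapsed_ms, uint32_t search_space_ms, double safety = 0.8) {
    assert(safety > 0.0 && safety <= 1.0);
    return static_cast<double>(elapsed_ms) >= safety * static_cast<double>(search_space_ms);
}
```

With ~31000 ms (OCTAXE at 9 TH/s) and the default margin, the bump triggers after about 24.8 s on the same job/ntime pair.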

shufps mentioned this pull request Apr 6, 2026

Add Stratum V2 (SV2) protocol support with Noise encryption #544

Merged


warioishere changed the title ~~Draft: Dynamic ASIC nonce space calculation (register 0x10)~~ Draft: control nonce space and timeouts for all all chip topologies Apr 6, 2026

adammwest reviewed Apr 8, 2026


use address_interval instead of asic_count for HCN calculation

27e37f0

As adammwest pointed out, the nonce space should be derived from the actual address_interval (256/interval) rather than next_power_of_two(asic_count). This correctly handles any chip address configuration.

fix HCN core nonce overlap: subtract 268 from hcn_max

254f4d2

134 per half clock cycle = 268 nonce overlap between adjacent cores. Without correction ~0.15% duplicate shares on 4-chip boards.

adammwest mentioned this pull request Apr 9, 2026

functions to control nonce space and timeouts for all chip topologies bitaxeorg/ESP-Miner#420

Open

shufps marked this pull request as ready for review April 12, 2026 06:17

shufps merged commit 784cae0 into shufps:develop Apr 12, 2026


warioishere commented Apr 5, 2026

Test Results (NerdQAxe++, 4x BM1370 @ 615MHz, SV2 Standard Channel)

Test 1: HCN on register 0x10, original chip IDs (0,4,8,12)

Test 2: Same as Test 1 + disabled `checkVrFrequencyChanged` (was overwriting register 0x10)

Test 3: HCN + Bitaxe-style chip IDs (0,64,128,192) + updated nonce-to-ASIC mapping

Open questions

VR frequency feature

NerdOCTAXE and NerdQaxe++ need ntime rolling
